NSF PAR Search | NSF Public Access Repository


Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications

Siampoi, Maria Despoina; Li, Jialiang; Krumm, John; Shahabi, Cyrus (July 2025, International Conference on Machine Learning)

Encoding geospatial objects is fundamental for geospatial artificial intelligence (GeoAI) applications, which leverage machine learning (ML) models to analyze spatial information. Common approaches transform each object into known formats, like image and text, for compatibility with ML models. However, this process often discards crucial spatial information, such as the object’s position relative to the entire space, reducing downstream task effectiveness. Alternative encoding methods that preserve some spatial properties are often devised for specific data objects (e.g., point encoders), making them unsuitable for tasks that involve different data types (i.e., points, polylines, and polygons). To this end, we propose POLY2VEC, a polymorphic Fourier-based encoding approach that unifies the representation of geospatial objects, while preserving the essential spatial properties. POLY2VEC incorporates a learned fusion module that adaptively integrates the magnitude and phase of the Fourier transform for different tasks and geometries. We evaluate POLY2VEC on five diverse tasks, organized into two categories. The first empirically demonstrates that POLY2VEC consistently outperforms object-specific baselines in preserving three key spatial relationships: topology, direction, and distance. The second shows that integrating POLY2VEC into a state-of-the-art GeoAI workflow improves the performance in two popular tasks: population prediction and land use inference.
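To make the "magnitude and phase of the Fourier transform" idea concrete, here is a minimal sketch for the simplest geometry, a single point. It is not the POLY2VEC implementation: the function name `point_fourier_features` and the small frequency grid are assumptions chosen for illustration, and polylines, polygons, and the learned fusion module are not covered. It relies only on the standard fact that the 2D Fourier transform of a unit impulse at (x, y) is exp(-2πi(ux + vy)), so the magnitude is constant and the location is carried entirely by the phase.

```python
import cmath
import math


def point_fourier_features(x, y, freqs):
    """Evaluate the 2D Fourier transform of a unit impulse at (x, y).

    F(u, v) = exp(-2*pi*i*(u*x + v*y)) for each (u, v) in freqs.
    Returns a list of (magnitude, phase) pairs, one per frequency.
    """
    feats = []
    for u, v in freqs:
        value = cmath.exp(-2j * math.pi * (u * x + v * y))
        feats.append((abs(value), cmath.phase(value)))
    return feats


# Hypothetical frequency grid; a real encoder would choose its own sampling.
freqs = [(u, v) for u in (0, 1, 2) for v in (0, 1, 2)]

a = point_fourier_features(0.10, 0.20, freqs)
b = point_fourier_features(0.15, 0.20, freqs)  # same point shifted in x
```

Shifting the point changes only the phase values in `b` relative to `a`; every magnitude stays at 1. This is the shift theorem, and it illustrates why an encoding that discards phase would lose position information.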
Free, publicly-accessible full text available July 13, 2026
TrajGPT: Controlled Synthetic Trajectory Generation Using a Multitask Transformer-Based Spatiotemporal Model

Hsu, Shang-Ling; Tung, Emmanuel; Krumm, John; Shahabi, Cyrus; Shafique, Khurram (October 2024, ACM Digital Library)

Human mobility modeling from GPS trajectories and synthetic trajectory generation are crucial for various applications, such as urban planning, disaster management and epidemiology. Both of these tasks often require filling gaps in a partially specified sequence of visits, a new problem that we call “controlled” synthetic trajectory generation. Existing methods for next-location prediction or synthetic trajectory generation cannot solve this problem as they lack the mechanisms needed to constrain the generated sequences of visits. Moreover, existing approaches (1) frequently treat space and time as independent factors, an assumption that fails to hold true in real-world scenarios, and (2) suffer from challenges in accuracy of temporal prediction as they fail to deal with mixed distributions and the inter-relationships of different modes with latent variables (e.g., day-of-the-week). These limitations become even more pronounced when the task involves filling gaps within sequences instead of solely predicting the next visit. We introduce TrajGPT, a transformer-based, multi-task, joint spatiotemporal generative model to address these issues. Taking inspiration from large language models, TrajGPT poses the problem of controlled trajectory generation as that of text infilling in natural language. TrajGPT integrates the spatial and temporal models in a transformer architecture through a Bayesian probability model that ensures that the gaps in a visit sequence are filled in a spatiotemporally consistent manner. Our experiments on public and private datasets demonstrate that TrajGPT not only excels in controlled synthetic visit generation but also outperforms competing models in next-location prediction tasks. In relative terms, TrajGPT achieves a 26-fold improvement in temporal accuracy while retaining more than 98% of spatial accuracy on average.
Full Text Available
Geo-Llama: Leveraging LLMs for Human Mobility Trajectory Generation with Constraints

https://doi.org/10.1109/MDM65600.2025.00023

Li, Siyu; Tran, Toan; Lin, Haowen; Krumm, John; Shahabi, Cyrus; Zhao, Lingyi; Shafique, Khurram; Xiong, Li (June 2025, IEEE)

Free, publicly-accessible full text available June 2, 2026
HTF: Homogeneous Tree Framework for Differentially-Private Release of Large Geospatial Datasets with Self-Tuning Structure Height

https://doi.org/10.1145/3569087

Shaham, Sina; Ghinita, Gabriel; Ahuja, Ritesh; Krumm, John; Shahabi, Cyrus (October 2022, ACM Transactions on Spatial Algorithms and Systems)

Mobile apps that use location data are pervasive, spanning domains such as transportation, urban planning and healthcare. Important use cases for location data rely on statistical queries, e.g., identifying hotspots where users work and travel. Such queries can be answered efficiently by building histograms. However, precise histograms can expose sensitive details about individual users. Differential privacy (DP) is a mature and widely-adopted protection model, but most approaches for DP-compliant histograms work in a data-independent fashion, leading to poor accuracy. The few proposed data-dependent techniques attempt to adjust histogram partitions based on dataset characteristics, but they do not perform well due to the addition of noise required to achieve DP. In addition, they use ad-hoc criteria to decide the depth of the partitioning. We identify density homogeneity as a main factor driving the accuracy of DP-compliant histograms, and we build a data structure that splits the space such that data density is homogeneous within each resulting partition. We propose a self-tuning approach to decide the depth of the partitioning structure that optimizes the use of privacy budget. Furthermore, we provide an optimization that scales the proposed split approach to large datasets while maintaining accuracy. We show through extensive experiments on large-scale real-world data that the proposed approach achieves superior accuracy compared to existing approaches.
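For readers unfamiliar with DP-compliant histograms, the sketch below shows the data-independent baseline that the abstract contrasts against: a fixed uniform grid with Laplace noise added to each cell count. It is not the HTF method. The function names, the unit-square domain, and the grid resolution are assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_grid_histogram(points, cells_per_side, epsilon, rng):
    """Noisy counts for a uniform grid over the unit square [0, 1] x [0, 1].

    Each point falls in exactly one cell, so adding or removing one point
    changes a single count by 1 (sensitivity 1). Laplace noise with scale
    1/epsilon on every cell therefore satisfies epsilon-DP.
    """
    counts = Counter()
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 lands in the last cell.
        col = min(int(x * cells_per_side), cells_per_side - 1)
        row = min(int(y * cells_per_side), cells_per_side - 1)
        counts[(row, col)] += 1
    scale = 1.0 / epsilon
    return {
        (r, c): counts[(r, c)] + laplace_noise(scale, rng)
        for r in range(cells_per_side)
        for c in range(cells_per_side)
    }


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(1000)]  # synthetic points
hist = dp_grid_histogram(pts, cells_per_side=4, epsilon=1.0, rng=rng)
```

Because the grid ignores where the data actually concentrates, sparse cells receive the same noise as dense ones. That is the accuracy problem that motivates data-dependent partitioning such as the density-homogeneous splits described above.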
Full Text Available
HTF - Homogeneous Tree Framework for Differentially-Private Release of Location Data

Shaham, Sina; Ghinita, Gabriel; Ahuja, Ritesh; Krumm, John; Shahabi, Cyrus (November 2021, ACM SIGSPATIAL)
Full Text Available
Mobility Data Science: Perspectives and Challenges

https://doi.org/10.1145/3652158

Mokbel, Mohamed; Sakr, Mahmoud; Xiong, Li; Züfle, Andreas; Almeida, Jussara; Anderson, Taylor; Aref, Walid; Andrienko, Gennady; Andrienko, Natalia; Cao, Yang; et al (June 2024, ACM Transactions on Spatial Algorithms and Systems)

Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of Global Positioning System (GPS)–equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated a significant impact in various domains, including traffic management, urban planning, and health sciences. In this article, we present the domain of mobility data science. Towards a unified approach to mobility data science, we present a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state-of-the-art, and describe open challenges for the research community in the coming years.
Full Text Available
Estimating spread of contact-based contagions in a population through sub-sampling

https://doi.org/10.14778/3461535.3461544

Zeighami, Sepanta; Shahabi, Cyrus; Krumm, John (May 2021, Proceedings of the VLDB Endowment)

Various phenomena such as viruses, gossip, and physical objects (e.g., packages and marketing pamphlets) can be spread through physical contacts. The spread depends on how people move, i.e., their mobility patterns. In practice, mobility patterns of an entire population are never available, and we usually have access to location data of a subset of individuals. In this paper, we formalize and study the problem of estimating the spread of a phenomenon in a population, given that we only have access to sub-samples of location visits of some individuals in the population. We show that simple solutions that estimate the spread in the sub-sample and scale it to the population, or more sophisticated solutions that rely on modeling location visits of individuals do not perform well in practice. Instead, we directly model the co-locations between the individuals. We introduce PollSpreader and PollSusceptible, two novel approaches that model the co-locations between individuals using a contact network, and infer the properties of the contact network using the sub-sample to estimate the spread of the phenomenon in the entire population. We analytically show that our estimates provide an upper bound and a lower bound on the spread of the disease in expectation. Finally, using a large high-resolution real-world mobility dataset, we experimentally show that our estimates are accurate in practice, while other methods that do not correctly account for co-locations between individuals result in entirely wrong observations (e.g., premature prediction of herd-immunity).
Full Text Available
Spatial Privacy Pricing: The Interplay between Privacy, Utility and Price in Geo-Marketplaces

https://doi.org/10.1145/3397536.3422213

Nguyen, Kien; Krumm, John; Shahabi, Cyrus (November 2020, Proceedings of the 28th International Conference on Advances in Geographic Information Systems)
Full Text Available
Spatial Privacy Pricing: The Interplay between Privacy, Utility and Price in Geo-Marketplaces

Nguyen, Kien; Krumm, John; Shahabi, Cyrus (November 2020, SIGSPATIAL '20: Proceedings of the 28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems)

A geo-marketplace allows users to be paid for their location data. Users concerned about privacy may want to charge more for data that pinpoints their location accurately, but may charge less for data that is more vague. A buyer would prefer to minimize data costs, but may have to spend more to get the necessary level of accuracy. We call this interplay between privacy, utility, and price spatial privacy pricing. We formalize the issues mathematically with an example problem of a buyer deciding whether or not to open a restaurant by purchasing location data to determine if the potential number of customers is sufficient to open. The problem is expressed as a sequential decision making problem, where the buyer first makes a series of decisions about which data to buy and concludes with a decision about opening the restaurant or not. We present two algorithms to solve this problem, including experiments that show they perform better than baselines.
Full Text Available
